Often after completing a task we feel that doing it another way would have been more efficient, had it occurred to us earlier. Well, luckily for our neural nets, they can learn all these different ways to ‘start over’ and become efficient and accurate. They do this through backpropagation, which means retracing the steps from the output back to the input, checking and adjusting the weights on the way back, and then starting the feedforward process again to give a new, and hopefully more accurate, output.

*Things to know:*

1- Error

2- Learning rate

3- Intuition for derivatives (Math covered in the next post)

4- Dot product
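As a quick refresher on the last item, here is a minimal sketch of the dot product (the numbers are made up for illustration): multiply matching entries and sum the results. In a neural net, this is how a layer's inputs are combined with its weights during feedforward.

```python
import numpy as np

# Dot product: multiply matching entries, then sum.
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -1.0, 0.25])

print(np.dot(inputs, weights))  # 1*0.5 + 2*(-1.0) + 3*0.25 = -0.75
```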

In the feed-forward step, we saw our input go through the network and the result was a predicted price for a house. Now that we have our prediction, we need to check how accurate it is. To do this, we calculate the difference between the actual value and the prediction. For example, if instance 1 is valued at $100,000 and we get a prediction of $150,000, the difference is $50,000, which means the prediction is inaccurate. This difference between the prediction and the actual label is the error. We will take this error, distribute it back to all the weights, find out which weights contribute most to the error, and update them accordingly.
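In code, this step is just a subtraction, using the hypothetical house-price numbers from above:

```python
# Error calculation for the house-price example above.
actual = 100_000      # the actual value (label) of instance 1
prediction = 150_000  # the network's predicted price

error = prediction - actual
print(error)  # 50000 -- a large difference, so the prediction is inaccurate
```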

When I first studied this, I had many questions, and the most important one was: HOW, how is it possible?? To understand this, it is important to understand what partial derivatives do, not how they are calculated, but rather what they are telling us. So, what do partial derivatives tell us? A partial derivative tells us how much a change in a weight will cause the error to change. This helps us know which weights contribute more to the error and thus which weights to change and by how much. All these partial derivatives together form the gradient, a vector of the partial derivatives of the error with respect to the weights. The partial derivative is denoted by **∂** and the gradient is denoted by **∇**.

**Difference between gradient and partial derivative?**

A partial derivative is the derivative of the error with respect to one particular weight, whereas the gradient combines the partial derivatives of the error with respect to all the weights into a single vector.
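To make this concrete, here is a small numerical sketch (the error function below is a made-up assumption, not the house-price network): nudge one weight at a time while holding the others fixed, and watch how the error responds. Collecting all these partial derivatives gives the gradient.

```python
import numpy as np

def error_fn(weights):
    # Toy error function of two weights (an illustrative assumption):
    # E(w1, w2) = (2*w1 + 3*w2 - 10)**2
    w1, w2 = weights
    return (2 * w1 + 3 * w2 - 10) ** 2

def numerical_gradient(f, w, h=1e-6):
    # Estimate each partial derivative with a central finite difference:
    # nudge one weight up and down, hold the others fixed, and measure
    # how much the error changes.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus = w.copy();  w_plus[i] += h
        w_minus = w.copy(); w_minus[i] -= h
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * h)
    return grad

w = np.array([1.0, 1.0])
print(numerical_gradient(error_fn, w))  # each entry: how the error changes per unit change in that weight
```

Here the second entry of the gradient is larger in magnitude than the first, which is exactly the signal backpropagation uses: the second weight contributes more to the error, so it gets the bigger update.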

After this step, we use our learning rate, most commonly denoted by alpha, **α**. The learning rate controls how much a weight changes. The partial derivative is multiplied by the learning rate and the result is subtracted from the original weight (equivalently, the negative of this term is added) to obtain the updated one. In a similar fashion, we travel all the way back to our input layer, updating weights each step of the way. With the new weights we have a new network, and we start the feedforward process again to obtain a new output; this is repeated until our predicted output is as close as possible, or equal, to our label.
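This update-and-repeat loop can be sketched in a few lines. The model below is a deliberate simplification (one linear weight, one training instance, squared error, and made-up numbers), not the house-price network itself, but the loop has the same shape: feed forward, measure the error, compute the gradient, update the weight, repeat.

```python
# Minimal sketch of the loop described above: feedforward, error,
# gradient, weight update, and feedforward again.
x, label = 2.0, 10.0   # one input and its actual value (illustrative)
w = 0.0                # initial weight
alpha = 0.1            # learning rate, between 0 and 1

for step in range(50):
    prediction = w * x             # feedforward
    error = prediction - label     # how far off the prediction is
    grad = error * x               # partial derivative of 0.5 * error**2 w.r.t. w
    w = w + (-alpha * grad)        # add the negative gradient term

print(round(w, 3))  # approaches 5.0, since 5.0 * 2.0 == 10.0
```

Each pass shrinks the error, and the weight settles where the prediction matches the label.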

Below is a drawing showing the first weights in both weight matrices. The weights of the first weight matrix, from input to hidden layer, are denoted by w^{1}_{ij}, and the weights of the second weight matrix, from hidden layer to output, are denoted by w^{2}_{ij}.

**Note:** The equation above has a negative sign, but as I mentioned, we add this term. The learning rate is between 0 and 1, i.e. positive, so the negative sign comes from the gradient part of the term: the update moves each weight in the direction opposite to its partial derivative, which is the direction that makes the error go down. **To review,** we first calculate the error. Then we take the partial derivative of the error with respect to each weight; this tells us how much that weight contributes to the error. We multiply this by the learning rate; the result (with its negative sign) is the value by which to change the weight. With the new weights we have a new network, and we begin again with the feedforward step. We continue to go back and forth until the weights are fully optimized and we get an accurate output.

*Tips:*

1- If you’re new to neural nets, this is a good time to strengthen your calculus if it’s weak.

2- Intuitive understanding and technical understanding are both important; my suggestion would be to first gain intuition and then work on the math.

3- Drawing figures and visually trying to understand helps a lot.

4- Whatever your learning style, there is a resource out there that caters to you; you just need to peruse the web enough to find the right fit.

5- If you’re also new to programming, regular practice will help you learn better. Fluency in programming also gives you an intuitive understanding of the code.

*Helpful Resources:*

1- 3blue1brown on YouTube. Literally every person’s suggestion, and trust me, they’re not exaggerating.

2- The Coding Train playlist on everything neural networks. I haven’t watched the whole playlist, but the explanations are simple and very easy to understand. The code is in JavaScript, so I skipped that part.

3- Andrew Ng’s description of topics. This is my go-to whenever I want to develop intuition for anything.

4- Hands-On Machine Learning, published by O’Reilly.

5- Connect with people on social media, join slack channels, attend study sessions. Having a community will help in ways you don’t even know yet.

[I will try to add to the list as and when I come across more resources.]

**Coming next:** Neural Networks – Feedforward Math

*Thank you for your time!!! All input is welcome, constructive criticism is deeply appreciated. Do let me know how you feel about the approach, the content and, well, basically everything.* 🙂