Neural Networks – Feedforward Math

In this post, we will do the math on our dummy dataset and calculate the feedforward steps by hand. We will use the features of our first instance (house 1) as the input vector, together with arbitrarily chosen weights. Our dummy dataset was as follows:

| INSTANCE | SQFT | NUM_BED | NUM_BATH |
|---|---|---|---|
| house 1 | 1500 | 2 | 2 |
| house 2 | 1700 | 3 | 2 |
| house 3 | 1750 | 3 | 3 |

dummy housing data

Let’s recall how the feature vector is multiplied by the weight matrix to get the inputs to the hidden layer. After the activation function is applied, these inputs become the values of our hidden nodes.

figure 1


Let’s see what goes into the calculation for hidden neuron 1.

figure 2



First, we take the dot product of our input vector with the weights that connect it to hidden neuron 1. These weights form the first column of our weight matrix. The matrix has one row per input feature (three) and one column per hidden node (four), so column j holds the weights feeding hidden node h_j, and w_{ij} connects input x_i to hidden node h_j.

x_1 \mathbf{.}w_{11} + x_2 \mathbf{.}w_{21} + x_3 \mathbf{.}w_{31}

Now we substitute the values shown in figure 2:
= 1500 * 0.10 + 2 * 0.20 + 2 * 0.30
= 150 + 0.4 + 0.6
= 151
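Here is a minimal Python sketch of the dot product for hidden neuron 1, useful for double-checking the arithmetic. The inputs are house 1's features from the table. The weights are the first-column values substituted above. The helper name `dot` is only illustrative.

```python
# Features of house 1: SQFT, NUM_BED, NUM_BATH
x = [1500, 2, 2]

# First column of the weight matrix: the weights feeding hidden neuron 1
w_col1 = [0.10, 0.20, 0.30]


def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(ai * bi for ai, bi in zip(a, b))


h1_input = dot(x, w_col1)
print(round(h1_input, 6))  # 151.0
```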

Our sigmoid activation is defined as \sigma(x) = \frac{1}{1 + e^{-x}}
Applying the sigmoid to this value, we get:
\sigma(151) \approx 1

An activation function decides whether, and how strongly, a neuron ‘fires’. Here we have used the sigmoid activation, which squashes any input into a value between 0 and 1. Other activation functions, for example ReLU, can also be used.
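Here is a small Python sketch of the sigmoid. It is applied to the hidden-node inputs worked out in this post: 151 here, and 226.2, 2.6 and 302.5 for the remaining nodes. The two-branch form is one common way to avoid overflow in `math.exp` for large negative inputs; it is an implementation choice, not something the formula requires. Mathematically, both branches equal 1 / (1 + e^{-x}).

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), split into two branches to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so exp() stays small
    e = math.exp(x)
    return e / (1.0 + e)


for value in (151, 226.2, 2.6, 302.5):
    print(value, round(sigmoid(value), 2))
```

Rounded to two decimals, this prints 1.0, 1.0, 0.93 and 1.0. In exact arithmetic σ(151) is slightly below 1. However, e^{-151} is far below double-precision resolution, so the computed value is 1.0.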

figure 3



Calculating for node h_2:
x_1 \mathbf{.}w_{12} + x_2 \mathbf{.}w_{22} + x_3 \mathbf{.}w_{32}
= 1500*0.15 + 2*0.25 + 2*0.35
= 225 + 0.5 + 0.7
= 226.2
\sigma(226.2) \approx 1

Calculating for node h_3:
x_1 \mathbf{.}w_{13} + x_2 \mathbf{.}w_{23} + x_3 \mathbf{.}w_{33}
= 1500*0 + 2*0.50 + 2*0.80
= 0 + 1 + 1.6
= 2.6
\sigma(2.6) \approx 0.93

Calculating for node h_4:
x_1 \mathbf{.}w_{14} + x_2 \mathbf{.}w_{24} + x_3 \mathbf{.}w_{34}
= 1500*0.20 + 2*0.75 + 2*0.5
= 300 + 1.5 + 1
= 302.5
\sigma(302.5) \approx 1
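All four hidden nodes can be computed at once as a vector-matrix product, which is the multiplication recalled at the start of this post. The sketch below uses plain Python rather than NumPy so that it stays self-contained. It assumes the weight matrix is laid out as described earlier: one row per input feature and one column per hidden node, filled with the values used in the calculations above.

```python
import math

# Input vector for house 1: SQFT, NUM_BED, NUM_BATH
x = [1500, 2, 2]

# Weight matrix: one row per input feature, one column per hidden node
W1 = [
    [0.10, 0.15, 0.00, 0.20],  # weights from x_1 (SQFT)
    [0.20, 0.25, 0.50, 0.75],  # weights from x_2 (NUM_BED)
    [0.30, 0.35, 0.80, 0.50],  # weights from x_3 (NUM_BATH)
]


def sigmoid(z):
    """Logistic sigmoid, split into two branches so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer_inputs(x, W):
    """Return x^T W: entry j is the dot product of x with column j of W."""
    n_hidden = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_hidden)]


z = layer_inputs(x, W1)       # pre-activation inputs to h_1..h_4
h = [sigmoid(v) for v in z]   # hidden-node values after the sigmoid
print([round(v, 1) for v in z])  # [151.0, 226.2, 2.6, 302.5]
print([round(v, 2) for v in h])  # [1.0, 1.0, 0.93, 1.0]
```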


Now we have the values of our four hidden neurons, and we move on to calculating the input to the output neuron. Since this is a regression task, we do not apply an activation to the output neuron. We simply take the dot product of the hidden values with the hidden-to-output weights and sum it up. These weights are a second set, separate from the input-to-hidden matrix above.
h_1 \mathbf{.}w_{11} + h_2 \mathbf{.}w_{21} + h_3 \mathbf{.}w_{31} + h_4 \mathbf{.}w_{41}
= 1*0.20 + 1*0.15 + 0.93*0.10 + 1*0.25
= 0.20 + 0.15 + 0.093 + 0.25
= 0.693
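The output neuron is just a weighted sum, so a sketch of it is short. It assumes the hidden-to-output weights 0.20, 0.15, 0.10 and 0.25 from the calculation above. Unlike the hand calculation, it keeps h_3 unrounded.

```python
import math

# Hidden activations from above: h_3 = sigma(2.6), the others saturate at ~1
h = [1.0, 1.0, 1.0 / (1.0 + math.exp(-2.6)), 1.0]

# Hidden-to-output weights w_11, w_21, w_31, w_41
w_out = [0.20, 0.15, 0.10, 0.25]

# Regression output: a plain weighted sum, no activation function
y_hat = sum(hi * wi for hi, wi in zip(h, w_out))
print(round(y_hat, 3))  # 0.693
```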

figure 4



The prediction in the output layer is 0.693. Now, how is this possible? $0.693 for a house?? Your doubts are justified, so let’s see why this happens. Let’s not lose sight of the fact that this is not our final output. It is only the result of our first forward pass with untrained, arbitrary weights. Next, the predicted value is compared with the actual label (the true price). In the back-propagation step, we use this difference to compute partial derivatives and distribute the error back through the network.
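The sketch below makes the comparison step concrete by computing an error and the partial derivatives for the output weights. Two assumptions apply. First, the loss is taken to be squared error, L = 0.5(ŷ - y)^2, because this post does not fix a loss function. Second, the target price is a HYPOTHETICAL placeholder, because the dummy table has no price column. Only the output layer is covered. Gradients for the input-to-hidden weights also need the chain rule through the sigmoid.

```python
import math

# Hidden activations and hidden-to-output weights from the forward pass above
h = [1.0, 1.0, 1.0 / (1.0 + math.exp(-2.6)), 1.0]
w_out = [0.20, 0.15, 0.10, 0.25]


def predict(h, w_out):
    """Linear output neuron: weighted sum of the hidden activations."""
    return sum(hi * wi for hi, wi in zip(h, w_out))


def squared_error_and_grads(h, w_out, target):
    """Assumed loss L = 0.5 * (y_hat - target)^2 and its gradients dL/dw_i = (y_hat - target) * h_i."""
    y_hat = predict(h, w_out)
    error = y_hat - target
    loss = 0.5 * error ** 2
    grads = [error * hi for hi in h]
    return loss, grads


# HYPOTHETICAL target price, for illustration only (not from the dataset)
target = 250_000.0
loss, grads = squared_error_and_grads(h, w_out, target)
print(f"loss = {loss:.3e}")
print("dL/dw:", [round(g, 1) for g in grads])
```

The prediction is far below the target, so every gradient is negative. A gradient-descent step will therefore increase every output weight.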

After the back-propagation step, the network updates its weights. Over many iterations it gradually adjusts them, mostly toward larger values here, so that by the time we finish training it predicts a much more accurate number.
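Here is a rough illustration of gradually adjusting the weights. It repeats gradient-descent steps on the output weights only, with the hidden activations held fixed. This is a simplification, since real training also updates the input-to-hidden weights through back-propagation. It assumes a squared-error loss, and the target price and learning rate are HYPOTHETICAL values chosen for the example.

```python
import math

# Hidden activations (held fixed here) and initial hidden-to-output weights
h = [1.0, 1.0, 1.0 / (1.0 + math.exp(-2.6)), 1.0]
w_out = [0.20, 0.15, 0.10, 0.25]

# HYPOTHETICAL target price and learning rate, for illustration only
target = 250_000.0
learning_rate = 0.1


def predict(h, w):
    """Linear output neuron: weighted sum of the hidden activations."""
    return sum(hi * wi for hi, wi in zip(h, w))


for step in range(50):
    # dL/dy_hat for the assumed loss L = 0.5 * (y_hat - target)^2
    error = predict(h, w_out) - target
    # Gradient-descent update: w_i <- w_i - learning_rate * error * h_i
    w_out = [wi - learning_rate * error * hi for wi, hi in zip(w_out, h)]

print(round(predict(h, w_out), 2))
print([round(w, 1) for w in w_out])
```

After 50 steps the prediction essentially matches the hypothetical target. All four output weights have grown into the tens of thousands.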

In this example, we use only one hidden layer, but we can also have more. The number of layers affects both the model’s capacity and its training cost, so we need to choose it carefully, neither too large nor too small. Suppose we had a second hidden layer. The activated values of the hidden neurons in layer 1 would become the inputs to the neurons in layer 2. Those neurons would again compute a dot product followed by an activation, just as we did above.
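Stacking layers only means feeding one layer's activated outputs into the next. The sketch below generalizes the earlier calculation to any number of sigmoid hidden layers followed by a linear output. `dense` and `forward` are hypothetical helper names. The optional bias argument in `dense` defaults to zero to match this post. The second-layer weights `W2` and `W_out2` are arbitrary, illustrative values.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, split into two branches so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x, W, b=None, activation=None):
    """One fully connected layer: x^T W (+ b), then an optional activation.

    W has one row per input and one column per neuron in this layer.
    """
    n_out = len(W[0])
    b = b if b is not None else [0.0] * n_out  # no bias by default, as in this post
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(n_out)]
    return [activation(v) for v in z] if activation else z


def forward(x, hidden_layers, W_out):
    """Pass x through each sigmoid hidden layer, then a linear output layer."""
    a = x
    for W in hidden_layers:
        a = dense(a, W, activation=sigmoid)  # layer k's outputs feed layer k + 1
    return dense(a, W_out)[0]


x = [1500, 2, 2]
W1 = [[0.10, 0.15, 0.00, 0.20],
      [0.20, 0.25, 0.50, 0.75],
      [0.30, 0.35, 0.80, 0.50]]
W_out = [[0.20], [0.15], [0.10], [0.25]]  # 4 hidden neurons -> 1 output

# One hidden layer reproduces the hand calculation (about 0.693)
print(round(forward(x, [W1], W_out), 3))

# HYPOTHETICAL second hidden layer (4 -> 2 neurons) with arbitrary weights
W2 = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.05, 0.05]]
W_out2 = [[0.5], [0.4]]
print(forward(x, [W1, W2], W_out2))
```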

Last words:

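– The data can also be scaled and standardized so that the features are on comparable scales. In our example, SQFT (around 1500) dwarfs NUM_BED and NUM_BATH. That is why the inputs to h_1, h_2 and h_4 are so large and their sigmoid outputs saturate at about 1. There are several blogs about scaling data, and I would advise you to read up on it.
– Other activations, like ReLU, can improve accuracy. ReLU is often preferred in hidden layers because, unlike sigmoid, it does not saturate for positive inputs, which helps avoid vanishing gradients. ReLU neurons can, however, ‘die’ if they get stuck outputting zero.
– To keep things simple, I haven’t used a bias neuron, but one can be added. It may help the model learn faster if the output neuron’s bias is initialized to a value on the scale of house prices, say 50,000, rather than near zero.
– I would also encourage you to try other machine learning algorithms for this task, not just neural nets.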
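The effect of scaling is easy to see on this dataset. With raw features, SQFT dominates every dot product and three of the four sigmoid units saturate. The sketch below applies z-score standardization to the dummy dataset. This is one common scaling choice; min-max scaling is another. It then feeds house 1 through the same hidden layer as before. It uses only the standard library. In practice, the mean and standard deviation would be computed on the training data only.

```python
import math
import statistics

# Dummy housing data: SQFT, NUM_BED, NUM_BATH for houses 1-3
data = [
    [1500, 2, 2],
    [1700, 3, 2],
    [1750, 3, 3],
]

# Same input-to-hidden weights as in the hand calculation
W1 = [[0.10, 0.15, 0.00, 0.20],
      [0.20, 0.25, 0.50, 0.75],
      [0.30, 0.35, 0.80, 0.50]]


def sigmoid(z):
    """Logistic sigmoid, split into two branches so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Z-score each column: (value - column mean) / column population std dev."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid division by zero
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def hidden_activations(x, W):
    """Sigmoid of x^T W, one value per hidden node."""
    n_hidden = len(W[0])
    return [sigmoid(sum(x[i] * W[i][j] for i in range(len(x)))) for j in range(n_hidden)]


raw = hidden_activations(data[0], W1)
scaled = hidden_activations(standardize(data)[0], W1)
print("raw:   ", [round(a, 3) for a in raw])
print("scaled:", [round(a, 3) for a in scaled])
```

With raw inputs, h_1, h_2 and h_4 come out at about 1.0. After standardization, all four activations for house 1 fall roughly between 0.15 and 0.35, so they are no longer pinned at the top of the sigmoid.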

Helpful Resources:
1- The most important resource for this blog, for me, was the Kaggle discussions. I would like to mention Ryan Holbrook for helping me in the Learn section discussion of the Intro to Deep Learning course.
2- Matt Mazur’s blog post, Step-by-step backpropagation example.
3- Hands-On Machine Learning, published by O’Reilly.
4- Andrew Ng’s explanations of topics. This is my go-to whenever I want to develop intuition for anything.
5- Connect with people on social media, join Slack channels, and attend study sessions. Having a community will help in ways you don’t even know yet.

And many more random YouTube videos and Google searches.

Coming next: Neural Networks – Backprop Math

Thank you for your time!!! All input is welcome, and constructive criticism is deeply appreciated. Do let me know how you feel about the approach, the content and, well, basically everything. 🙂
